Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

Bibliographic record

Authors: Donahue, Jeff; Hendricks, Lisa Anne; Rohrbach, Marcus; Venugopalan, Subhashini; Guadarrama, Sergio; Saenko, Kate; Darrell, Trevor
Year: 2016
Original format: PDF

Similar literature

1. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description [J]. Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, et al. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017, no. 4
2. 3D long-term recurrent convolutional networks for human sub-assembly recognition in human-robot collaboration [J]. Xianhe Wen, Heping Chen. Assembly Automation. 2020, no. 4
3. ARCH: Adaptive recurrent-convolutional hybrid networks for long-term action recognition [J]. Xin Miao, Zhang Hong, Wang Helong, et al. Neurocomputing. 2016, no. Feba20
4. Long-term recurrent convolutional networks for visual recognition and description [C]. Donahue Jeff, Hendricks Lisa Anne, Guadarrama Sergio, et al. IEEE Conference on Computer Vision and Pattern Recognition. 2015
5. Cortex-inspired goal-directed recurrent networks for developmental visual attention and recognition with complex backgrounds [D]. Luciw, Matthew. 2010
6. ARCH: Adaptive recurrent-convolutional hybrid networks for long-term action recognition [O]. Miao Xin, Hong Zhang, Helong Wang, et al.
7. Long-term recurrent convolutional networks for visual recognition and description [O]. Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, et al. 2015
8. Long-term Recurrent Convolutional Networks for Visual Recognition and Description [R]. Donahue, J., Hendricks, L. A., Guadarrama, S., et al. 2014
